DERIVATION OF FAST DCT ALGORITHMS USING ALGEBRAIC
TECHNIQUE BASED ON GALOIS THEORY

MAXIM VASHKEVICH* AND ALEXANDER PETROVSKY* 

Abstract. The paper presents an algebraic technique for the derivation of fast discrete cosine
transform (DCT) algorithms. The technique is based on the algebraic signal processing theory
(ASP). In ASP a DCT is associated with a polynomial algebra $\mathcal{A}_{\mathbb{C}} = \mathbb{C}[x]/p(x)$. A fast algorithm is
obtained as a stepwise decomposition of $\mathcal{A}_{\mathbb{C}}$. In order to reveal the connection between the derivation of
fast DCT algorithms and Galois theory we define $\mathcal{A}$ over the field of rational numbers $\mathbb{Q}$ instead of
the complex numbers $\mathbb{C}$. The decomposition of $\mathcal{A}_{\mathbb{Q}}$ requires the extension of the base field $\mathbb{Q}$ to the splitting field $E$
of the polynomial $p(x)$. Galois theory is used to find intermediate subfields $L_i$ in which the polynomial $p(x)$
is factored. Based on this factorization a fast DCT algorithm is derived.

Key words. Discrete cosine transform, DCT, polynomial transform, fast algorithm, Galois theory

1. Introduction. The DCT [1] is one of the most widely used transforms in video and
image processing, especially in coding applications [2]. Different algebraic structures
of DCT's have been investigated by several authors pH E] in order to obtain bounds 

on the arithmetic complexity of their computations [6] and to derive fast algorithms [3].
The most general and comprehensive theory that explains the relation between the DCT
and algebraic structures has been presented in [7]. Subsequently this theory has
been called algebraic signal processing theory (ASP) [5]. A characteristic feature of
ASP is that the underlying signal model is considered instead of a transformation matrix.

The notion of a signal model implies a triple $(\mathcal{A}, \mathcal{M}, \Phi)$, where $\mathcal{A}_{\mathbb{C}} = \mathbb{C}[x]/p(x)$ is an
algebra of filters, $\mathcal{M}$ is an $\mathcal{A}$-module of signals (or signal space) and $\Phi$ is a generalized
concept of the z-transform. Here and below we use a subscript to indicate the base field of
the polynomial algebra where necessary.

The correspondence between a polynomial algebra and a DCT is a key point of
ASP. A fast transform algorithm is obtained as a stepwise decomposition of $\mathcal{A}_{\mathbb{C}}$, which
requires a stepwise factorization of $p(x)$ [9]. We propose to define $\mathcal{A}$ over the field of
rational numbers $\mathbb{Q}$. Note that, except in trivial cases, $p(x)$ cannot be completely factored over $\mathbb{Q}$,
and therefore we need to extend the base field to the splitting field $E$ of $p(x)$.

The stepwise factorization of $p(x)$ and the extension of the base field $\mathbb{Q}$ are carried
out using Galois theory. The theory allows us to find a set of intermediate subfields
$L_i$ between $\mathbb{Q}$ and $E$. In each $L_i$ the polynomial $p(x)$ is factored one step further. The
factorization results in a fast DCT algorithm.

As an application of the proposed technique we derive a fast recursive algorithm
for $\text{DCT-4}_n$, where $n = 2^k$.

2. Algebraic background. An algebra is a vector space $\mathcal{A}$ over some base field
$F$ in which a multiplication of elements is defined that satisfies the distributivity law. For
instance, the complex numbers, the quaternions and $\mathbb{Q}[x]$ (the set of polynomials with rational
coefficients) are algebras.

The basic notion of ASP is the polynomial algebra $\mathcal{A}_{\mathbb{C}} = \mathbb{C}[x]/p(x)$. If $p(x)$ is a
polynomial of degree $n$ then $\mathbb{C}[x]/p(x) = \{q(x) \mid \deg(q) < n\}$ is the set of polynomials of
degree smaller than $n$ with addition and multiplication modulo $p(x)$.

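The polynomial algebra $\mathbb{C}[x]/p(x)$ can be decomposed into a direct sum of irreducible
subalgebras $\mathbb{C}[x]/(x - \alpha_k)$ using the Chinese remainder theorem (CRT):

$$\mathcal{F}: \mathbb{C}[x]/p(x) \to \bigoplus_{0 \le k < n} \mathbb{C}[x]/(x - \alpha_k),$$

under the assumption that the zeros $\alpha = (\alpha_0, \dots, \alpha_{n-1})$ of $p(x)$ are pairwise distinct.
The operator $\mathcal{F}$ can be expressed in the following matrix form:

$$\mathcal{F} = \mathcal{P}_{b,\alpha} = [p_\ell(\alpha_k)]_{0 \le k,\ell < n}, \tag{2.1}$$

where $b = (p_0, \dots, p_{n-1})$ is a basis of $\mathbb{C}[x]/p(x)$. Eq. (2.1) assumes that in each
$\mathbb{C}[x]/(x - \alpha_k)$ the unit basis $(x^0) = (1)$ is chosen. $\mathcal{P}_{b,\alpha}$ is the polynomial transform for $\mathcal{A}_{\mathbb{C}}$
with basis $b$ [5]. A scaled polynomial transform is obtained for a different basis $(\beta_k)$ in
each $\mathbb{C}[x]/(x - \alpha_k)$:

$$\mathcal{F} = \operatorname{diag}(1/\beta_0, \dots, 1/\beta_{n-1}) \cdot \mathcal{P}_{b,\alpha}. \tag{2.2}$$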
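As a small numerical illustration of the polynomial transform, the sketch below builds $[p_\ell(\alpha_k)]$ for a toy algebra and checks that it diagonalizes multiplication by $x$, which is what the CRT decomposition expresses. The polynomial $p(x) = x^3 - x$, the monomial basis and the helper name `polynomial_transform` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def polynomial_transform(basis, zeros):
    """Return [p_l(alpha_k)]: rows are indexed by the zeros, columns by the basis."""
    return np.array([[np.polyval(p, a) for p in basis] for a in zeros])

# Illustrative choice (not from the paper): p(x) = x^3 - x = x (x - 1)(x + 1)
# with the monomial basis b = (1, x, x^2).
# np.polyval expects the highest-degree coefficient first.
zeros = np.array([0.0, 1.0, -1.0])
basis = [np.array([1.0]), np.array([1.0, 0.0]), np.array([1.0, 0.0, 0.0])]
P = polynomial_transform(basis, zeros)

# Matrix of "multiply by x" in C[x]/p(x) with respect to b:
# x*1 = x, x*x = x^2, x*x^2 = x^3 = x (mod p).
M_x = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])

# In the decomposed algebra, multiplication by x acts as scaling by alpha_k.
diagonalized = P @ M_x @ np.linalg.inv(P)
```

Here `diagonalized` should agree with `np.diag(zeros)` up to rounding, since each summand $\mathbb{C}[x]/(x - \alpha_k)$ is one-dimensional.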

2.1. Algebraic derivation of fast transform algorithm in ASP. As stated
above, a fast transform algorithm in ASP is derived as a stepwise decomposition of the
underlying algebra $\mathbb{C}[x]/p(x)$ into a direct sum of one-dimensional algebras. Let
$p(x) = q(x) \cdot r(x)$, $k = \deg(q)$ and $m = \deg(r)$; then

$$\mathbb{C}[x]/p(x) \to \mathbb{C}[x]/q(x) \oplus \mathbb{C}[x]/r(x) \tag{2.3}$$

$$\to \bigoplus_{0 \le i < k} \mathbb{C}[x]/(x - \beta_i) \oplus \bigoplus_{0 \le j < m} \mathbb{C}[x]/(x - \gamma_j) \tag{2.4}$$

$$\to \bigoplus_{0 \le i < n} \mathbb{C}[x]/(x - \alpha_i), \tag{2.5}$$

where $\beta_i$ and $\gamma_j$ are the zeros of $q(x)$ and $r(x)$, respectively. If $c$ and $d$ are the
bases of $\mathbb{C}[x]/q(x)$ and $\mathbb{C}[x]/r(x)$, respectively, then (2.3)-(2.5) are expressed in the
following matrix form [8]:

$$\mathcal{P}_{b,\alpha} = P(\mathcal{P}_{c,\beta} \oplus \mathcal{P}_{d,\gamma})B, \tag{2.6}$$

where $A \oplus B = \begin{bmatrix} A & \\ & B \end{bmatrix}$ denotes the direct sum of matrices. The matrix $B$ maps the
basis $b$ to the concatenation of the bases $(c, d)$ and corresponds to (2.3). Eq. (2.4) uses
the CRT to decompose $\mathbb{C}[x]/q(x)$ and $\mathbb{C}[x]/r(x)$. This step corresponds to the direct
sum of the matrices $\mathcal{P}_{c,\beta}$ and $\mathcal{P}_{d,\gamma}$. Finally, the permutation matrix $P$ maps the concatenation
$(\beta, \gamma)$ to the ordered list of zeros $\alpha$ in (2.5). Given that $B$ is sparse, (2.6) leads to a
fast algorithm.
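The matrix identity (2.6) can be checked numerically on a toy case. The sketch below assumes $p(x) = x^4 - 1 = (x^2 - 1)(x^2 + 1)$ with monomial bases $b = (1, x, x^2, x^3)$ and $c = d = (1, x)$; this example and the helper name `vandermonde` are illustrative choices, not taken from the paper.

```python
import numpy as np

def vandermonde(zeros, n):
    """Polynomial transform [alpha_k ** l] for the monomial basis (1, x, ..., x^(n-1))."""
    return np.array([[a ** l for l in range(n)] for a in zeros])

alpha = np.array([1, 1j, -1, -1j])   # chosen ordering of the zeros of x^4 - 1
beta = np.array([1, -1])             # zeros of q(x) = x^2 - 1
gamma = np.array([1j, -1j])          # zeros of r(x) = x^2 + 1

# Base change B: reduce 1, x, x^2, x^3 modulo q (x^2 -> 1) and modulo r (x^2 -> -1).
B = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, -1, 0],
              [0, 1, 0, -1]], dtype=complex)

# Direct sum of the two smaller polynomial transforms.
Z = np.zeros((2, 2))
direct_sum = np.block([[vandermonde(beta, 2), Z],
                       [Z, vandermonde(gamma, 2)]])

# Permutation P: reorder the concatenation (beta, gamma) into the list alpha.
concat = np.concatenate([beta, gamma])
perm = [int(np.argmin(np.abs(concat - a))) for a in alpha]
P = np.eye(4)[perm]

fast = P @ direct_sum @ B
```

For this choice the product `fast` reproduces `vandermonde(alpha, 4)`, which is the 4-point DFT matrix; $B$ has only two nonzero entries per row, which is the sparsity that makes the factorization useful.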

2.2. Polynomial algebra for DCT-4. Let us consider the polynomial algebra
associated with the widely used $\text{DCT-4}_n$:

$$\mathcal{A}_{\mathbb{C}} = \mathbb{C}[x]/2T_n(x), \quad b = (V_0, \dots, V_{n-1}), \tag{2.7}$$

where $T$ and $V$ are Chebyshev polynomials of the first and third kind, respectively.
Chebyshev polynomials have the following closed form expressions ($\cos\theta = x$):

$$T_n(x) = \cos(n\theta), \qquad V_n(x) = \frac{\cos(n + \frac{1}{2})\theta}{\cos\frac{1}{2}\theta}.$$

The zeros of $2T_n(x)$ are $\alpha_k = \cos(k + \frac{1}{2})\frac{\pi}{n}$, $0 \le k < n$. Thus, according to (2.1), the polynomial
transform for the algebra (2.7) is defined as

$$\mathcal{P}_{b,\alpha} = [V_\ell(\alpha_k)]_{0 \le k,\ell < n} = \left[\frac{\cos(k + \frac{1}{2})(\ell + \frac{1}{2})\frac{\pi}{n}}{\cos(k + \frac{1}{2})\frac{\pi}{2n}}\right]_{0 \le k,\ell < n}. \tag{2.8}$$

In order to get the matrix of $\text{DCT-4}_n$, (2.8) is multiplied from the left by the scaling
diagonal matrix

$$D_n^{(C4)} = \operatorname{diag}_{0 \le k < n}\left(\cos(k + \tfrac{1}{2})\tfrac{\pi}{2n}\right), \tag{2.9}$$

which yields

$$\text{DCT-4}_n = \left[\cos(k + \tfrac{1}{2})(\ell + \tfrac{1}{2})\tfrac{\pi}{n}\right]_{0 \le k,\ell < n}. \tag{2.10}$$

Eqs. (2.8)-(2.10) show that DCT-4 is a scaled polynomial transform of the form (2.2)
for the specified polynomial algebra (2.7).

The polynomial transform corresponding to a discrete trigonometric transform (DTT)
is denoted as $\overline{\text{DTT}}$ (for instance, $\overline{\text{DCT-4}}_n$ stands for the matrix in (2.8)).
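The relation between the polynomial transform (2.8), the scaling matrix (2.9) and the DCT-4 matrix (2.10) can be confirmed numerically. The following is a minimal sketch; the function names are hypothetical and the size $n = 8$ is an arbitrary choice.

```python
import numpy as np

def dct4_polynomial_transform(n):
    """Matrix (2.8): V_l evaluated at the zeros alpha_k of 2*T_n."""
    k = np.arange(n)[:, None]
    l = np.arange(n)[None, :]
    theta = (k + 0.5) * np.pi / n
    return np.cos((l + 0.5) * theta) / np.cos(theta / 2)

def dct4_scaling(n):
    """Diagonal matrix (2.9)."""
    k = np.arange(n)
    return np.diag(np.cos((k + 0.5) * np.pi / (2 * n)))

def dct4(n):
    """Matrix (2.10)."""
    k = np.arange(n)[:, None]
    l = np.arange(n)[None, :]
    return np.cos((k + 0.5) * (l + 0.5) * np.pi / n)

n = 8
P_bar = dct4_polynomial_transform(n)
alpha = np.cos((np.arange(n) + 0.5) * np.pi / n)
scaled = dct4_scaling(n) @ P_bar
```

As a sanity check on the closed form of $V_\ell$, column 1 of `P_bar` should equal $V_1(\alpha_k) = 2\alpha_k - 1$, and `scaled` should equal `dct4(n)`.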

3. Changing base field of polynomial algebra. A special feature of the
polynomial algebras $\mathcal{A}_{\mathbb{C}} = \mathbb{C}[x]/p(x)$ associated with DCTs is that all coefficients of the
polynomial $p(x)$ are integers [8]. Thus the base field of $\mathcal{A}_{\mathbb{C}}$ can be changed to the field of
rational numbers $\mathbb{Q}$:

$$\mathcal{A}_{\mathbb{Q}} = \mathbb{Q}[x]/p(x). \tag{3.1}$$

In the general case, the complete factorization of $p(x)$ requires an extension of the field $\mathbb{Q}$.

3.1. Field extensions and splitting field. A field $E$ is an extension field of
the field $F$ if $F \subset E$. The extension $[E : F]$ is a splitting field of the polynomial
$p(x)$ over $F$ if $p(x)$ splits into linear factors over $E$ and $E$ is generated over $F$ by the roots of $p(x)$.

Example. $\mathbb{Q}[\sqrt{2}]$ is the splitting field of $p(x) = x^2 - 2$ over $\mathbb{Q}$. The field $\mathbb{Q}[\sqrt{2}]$ is
generated by the adjunction of the element $\sqrt{2}$ to $\mathbb{Q}$.

A one-to-one mapping of $E$ onto itself that preserves addition and multiplication is referred to as an automorphism. Let $E$ be an
extension field of $F$; then an $F$-automorphism of $E$ is an automorphism $\phi$ of $E$ such that
$\phi(x) = x$ for all $x \in F$. The group of such automorphisms is denoted by $\operatorname{Aut} E/F$.

Example. Let us define the function $f: \mathbb{Q}[\sqrt{2}] \to \mathbb{Q}[\sqrt{2}]$ such that

$$f(a + b\sqrt{2}) = a - b\sqrt{2}, \tag{3.2}$$

then $f$ is a $\mathbb{Q}$-automorphism of the field $\mathbb{Q}[\sqrt{2}]$.
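The claim that the map (3.2) is a $\mathbb{Q}$-automorphism can be spot-checked with exact rational arithmetic. The sketch below models an element $a + b\sqrt{2}$ by its pair of rational coordinates; the class name `QSqrt2` and the sample elements are illustrative assumptions.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class QSqrt2:
    """Element a + b*sqrt(2) of Q[sqrt(2)] with rational a and b."""
    a: Fraction
    b: Fraction

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b s)(c + d s) = (ac + 2bd) + (ad + bc) s, where s*s = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

def f(t):
    """The conjugation map: a + b*sqrt(2) -> a - b*sqrt(2)."""
    return QSqrt2(t.a, -t.b)

x = QSqrt2(Fraction(3, 2), Fraction(-5, 7))
y = QSqrt2(Fraction(1, 4), Fraction(2, 3))
rational = QSqrt2(Fraction(9, 5), Fraction(0))
```

For these samples, `f(x + y) == f(x) + f(y)`, `f(x * y) == f(x) * f(y)`, `f(rational) == rational` and `f(f(x)) == x` all hold, matching the statement that $f$ fixes $\mathbb{Q}$ and has order 2. A finite spot check is of course not a proof.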

3.2. Foundations of Galois theory. Galois theory can be used for finding
the extensions of $\mathbb{Q}$ needed to factor $p(x)$. These extensions are defined by the corresponding
subgroups of the Galois group of $p(x)$.

Let us consider a polynomial $p(x) \in F[x]$ with distinct roots. If $E$ is a splitting
field of $p(x)$ then the group $\operatorname{Aut} E/F$ is called the Galois group of $p(x)$ and is denoted by
$\operatorname{Gal}(E/F)$ or $\operatorname{Gal}_p$.

Example. The splitting field of the polynomial $p(x) = x^2 - 2$ is $\mathbb{Q}[\sqrt{2}]$. The Galois group
of $p(x)$ consists of two elements, $\operatorname{Gal}(\mathbb{Q}[\sqrt{2}]/\mathbb{Q}) = \{f, e\}$, where $f$ is given in (3.2) and
$e$ is the identity element of the group, $e(a + b\sqrt{2}) = a + b\sqrt{2}$, and $f \cdot f = e$. Thus
$\operatorname{Gal}(\mathbb{Q}[\sqrt{2}]/\mathbb{Q})$ is a cyclic group of order 2.

The main notion that is used below is the Galois correspondence between intermediate
fields of the field extension $[E : F]$ and subgroups of its Galois group. Each subgroup
$H \subseteq \operatorname{Gal}(E/F)$ corresponds to a subfield $L \subseteq E$, which is given by

$$L = \{x \in E \mid \phi(x) = x,\ \forall \phi \in H\}.$$

The fundamental theorem of Galois theory states that any tower of fields

$$F = L_0 \subset L_1 \subset \dots \subset L_r = E \tag{3.3}$$

corresponds to a series of subgroups of the Galois group $\operatorname{Gal}(E/F)$ in reverse order

$$\operatorname{Gal}(E/F) = G_0 \supset G_1 \supset \dots \supset G_r = \{1\}. \tag{3.4}$$

This bijection between the subfields $L_i$ and the subgroups $G_i$ is called the Galois correspondence.

3.3. Using Galois correspondence for derivation of fast transform algorithm.
For an irreducible polynomial $p(x) \in F[x]$ with a splitting field $E$ the tower
of fields (3.3) can be obtained using the Galois correspondence. $p(x)$ is decomposed into
factors of lower degree in each subsequent field $L_i$ and is finally represented as a
product of linear factors $p(x) = \prod_{i=0}^{n-1}(x - \alpha_i)$ in $E$. Considering (2.6), the process results
in a fast algorithm.

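4. Derivation of fast $\text{DCT-4}_{2^k}$ algorithm. In this section the foregoing theoretical
notions are applied to the derivation of a fast recursive $\text{DCT-4}_{2^k}$ algorithm. The polynomial
algebra associated with $\text{DCT-4}_{2^k}$ is defined as

$$\mathcal{A}_{\mathbb{Q}} = \mathbb{Q}[x]/2T_{2^k}(x).$$

Let us determine the Galois group $\operatorname{Gal}_{2T_{2^k}}$ of $2T_{2^k}(x)$. $\operatorname{Gal}_{2T_{2^k}}$ is a cyclic group of order $2^k$
($\mathbb{Z}_{2^k}$):

$$\operatorname{Gal}_{2T_{2^k}} \cong \mathbb{Z}_{2^k}. \tag{4.1}$$

The last equation leads us to the following normal series of subgroups of $\operatorname{Gal}_{2T_{2^k}}$:

$$\operatorname{Gal}_{2T_{2^k}} \cong \mathbb{Z}_{2^k} \supset \mathbb{Z}_{2^{k-1}} \supset \dots \supset \mathbb{Z}_2 \supset \{1\}, \tag{4.2}$$

and the corresponding tower of fields

$$\mathbb{Q} \subset \mathbb{Q}[\sqrt{2}] \subset \dots \subset \mathbb{Q}\Big[\underbrace{\sqrt{2 + \sqrt{2 + \dots + \sqrt{2}}}}_{k}\Big]. \tag{4.3}$$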
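The generators in the tower (4.3) can be related to the zeros of $2T_{2^k}$ numerically: a radical nested $k$ times equals $2\cos(\pi/2^{k+1})$, which is twice the largest zero of $2T_{2^k}$. The sketch below checks this in floating point; the function name `nested_radical` is a hypothetical helper.

```python
import math

def nested_radical(k):
    """sqrt(2 + sqrt(2 + ... + sqrt(2))) with k square roots."""
    value = 0.0
    for _ in range(k):
        value = math.sqrt(2.0 + value)
    return value

def two_T(n, x):
    """2*T_n(x) for |x| <= 1 via the closed form T_n(cos t) = cos(n t)."""
    return 2.0 * math.cos(n * math.acos(x))

# Compare the nested radical with 2*cos(pi / 2^(k+1)) and evaluate 2*T_{2^k}
# at half the nested radical, which should be (numerically) zero.
checks = [(k, nested_radical(k), 2.0 * math.cos(math.pi / 2 ** (k + 1)),
           two_T(2 ** k, nested_radical(k) / 2.0))
          for k in range(1, 7)]
```

Each tuple in `checks` holds $k$, the nested radical, the cosine expression and the residual of $2T_{2^k}$; the second and third entries should coincide and the residual should vanish up to rounding.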



Using (4.1)-(4.3) the factorization of $2T_{2^k}(x)$ can be expressed as a recursive
formula. In order to get its general form let us consider the special case $k = 2$. The
zeros of $2T_4(x)$ are $\alpha_\ell = \cos(\ell + \frac{1}{2})\frac{\pi}{4}$, $\ell = 0, \dots, 3$:

$$\alpha_0 = \cos\frac{\pi}{8} = -\alpha_3 = -\cos\frac{7\pi}{8} = \frac{1}{2}\sqrt{2 + \sqrt{2}},$$

$$\alpha_1 = \cos\frac{3\pi}{8} = -\alpha_2 = -\cos\frac{5\pi}{8} = \frac{1}{2}\sqrt{2 - \sqrt{2}}.$$

Table 4.1
Galois group $\operatorname{Gal}_{2T_4}$

The splitting field of $2T_4(x)$ is $\mathbb{Q}\big[\sqrt{2 + \sqrt{2}}\big]$. In this case any element $\theta$ of the field
is expressed as

$$\theta = u + v\sqrt{2} + w\sqrt{2 - \sqrt{2}} + q\sqrt{2 + \sqrt{2}},$$

where $u, v, w, q \in \mathbb{Q}$. The field is a four-dimensional vector space over $\mathbb{Q}$.

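The group $\operatorname{Aut} \mathbb{Q}\big[\sqrt{2 + \sqrt{2}}\big]/\mathbb{Q}$ is the Galois group of $2T_4(x)$. Its elements are the
following automorphisms:

$$\sigma_0(\theta) = u + v\sqrt{2} + w\sqrt{2 - \sqrt{2}} + q\sqrt{2 + \sqrt{2}},$$

$$\sigma_1(\theta) = u - v\sqrt{2} - w\sqrt{2 + \sqrt{2}} + q\sqrt{2 - \sqrt{2}},$$

$$\sigma_2(\theta) = u - v\sqrt{2} + w\sqrt{2 + \sqrt{2}} - q\sqrt{2 - \sqrt{2}},$$

$$\sigma_3(\theta) = u + v\sqrt{2} - w\sqrt{2 - \sqrt{2}} - q\sqrt{2 + \sqrt{2}}.$$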
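The composition table of these four automorphisms can be computed mechanically. In the sketch below each $\sigma_i$ is modelled as a signed permutation of the coordinates $(u, v, w, q)$ with respect to the basis $(1, \sqrt{2}, \sqrt{2 - \sqrt{2}}, \sqrt{2 + \sqrt{2}})$; this matrix encoding and the helper names are assumptions made for illustration, and the code only composes the maps, it does not prove that they are field automorphisms.

```python
import numpy as np

# sigma[i] maps the coordinate vector (u, v, w, q) of theta to that of sigma_i(theta).
sigma = [
    np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]),     # sigma_0
    np.array([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0]]),   # sigma_1
    np.array([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]),   # sigma_2
    np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]),   # sigma_3
]

def index_of(matrix):
    """Index i such that sigma[i] equals the given matrix."""
    for i, s in enumerate(sigma):
        if np.array_equal(s, matrix):
            return i
    raise ValueError("composition left the set of automorphisms")

# table[i][j] is the index of sigma_i composed with sigma_j.
table = [[index_of(sigma[i] @ sigma[j]) for j in range(4)] for i in range(4)]

# Order of sigma_1: smallest power that gives the identity.
power, order = sigma[1].copy(), 1
while not np.array_equal(power, sigma[0]):
    power, order = power @ sigma[1], order + 1
```

With this encoding $\sigma_1$ has order 4, so the group is cyclic, and $\sigma_1^2 = \sigma_3$, so $\{\sigma_0, \sigma_3\}$ is its subgroup of order 2.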



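The Galois group $\operatorname{Gal}_{2T_4}$ is presented in Table 4.1.

The table shows that $H = \{\sigma_0, \sigma_3\}$ is a subgroup of $\operatorname{Gal}_{2T_4}$, $\operatorname{Gal}_{2T_4} \cong \mathbb{Z}_4$ and
$H \cong \mathbb{Z}_2$. The subgroup $H$ determines the intermediate subfield $\mathbb{Q}_H$:

$$\mathbb{Q}_H = \big\{\theta \in \mathbb{Q}\big[\sqrt{2 + \sqrt{2}}\big] \mid g(\theta) = \theta,\ \forall g \in H\big\}.$$

It is easy to see that $\mathbb{Q}_H = \mathbb{Q}[\sqrt{2}]$ (both elements of $H$ leave $\sqrt{2}$ fixed), and by the Galois correspondence

$$\mathbb{Z}_4 \supset \mathbb{Z}_2 \supset \{1\} \;\Rightarrow\; \mathbb{Q} \subset \mathbb{Q}[\sqrt{2}] \subset \mathbb{Q}\big[\sqrt{2 + \sqrt{2}}\big].$$

The factorization of $2T_4$ is given below:

$$\mathbb{Q}: \quad 2T_4(x),$$

$$\mathbb{Q}[\sqrt{2}]: \quad (2T_2(x) - \sqrt{2})(2T_2(x) + \sqrt{2}),$$

$$\mathbb{Q}\big[\sqrt{2 + \sqrt{2}}\big]: \quad \big(2T_1(x) + \sqrt{2 + \sqrt{2}}\big)\big(2T_1(x) - \sqrt{2 + \sqrt{2}}\big)\big(2T_1(x) + \sqrt{2 - \sqrt{2}}\big)\big(2T_1(x) - \sqrt{2 - \sqrt{2}}\big).$$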

4.1. Recursive factorization of $2T_{2^k}$. From the special case it can be inferred
that $2T_{2^k}$ is factored into polynomials of the form $2T_n - 2\cos r\pi$ and that the factorization
process is expressed by the following recursive formula:

$$2T_{2^k}(x) - 2\cos r\pi = \left(2T_{2^{k-1}}(x) - 2\cos\tfrac{r\pi}{2}\right)\left(2T_{2^{k-1}}(x) - 2\cos\pi(1 - \tfrac{r}{2})\right), \tag{4.4}$$

which can be proved using the closed form of $T_{2^k}$.

The formula defines the factorization of $2T_{2^k}(x)$ when $r = 1/2$. Because $2\cos\frac{r\pi}{2} = \sqrt{2 + 2\cos r\pi}$,
the left and right sides of (4.4) have different base fields: $\mathbb{Q}[2\cos r\pi]$
and $\mathbb{Q}[\sqrt{2 + 2\cos r\pi}]$, respectively. Equation (4.4) provides the successive transition from
$\mathbb{Q}$ to $E$ in accordance with (4.3).
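Formula (4.4) can be checked by expanding both sides in the monomial basis, and applying it recursively down to linear factors recovers the zeros of $2T_{2^k}$. The sketch below uses NumPy's Chebyshev utilities; the function names and the test values of $k$ and $r$ are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import chebyshev, polynomial

def two_T(n):
    """Monomial coefficients (lowest degree first) of 2*T_n(x)."""
    return 2.0 * chebyshev.cheb2poly([0.0] * n + [1.0])

def factor_step(k, r):
    """Return the left and right sides of (4.4) as coefficient arrays."""
    lhs = two_T(2 ** k)
    lhs[0] -= 2.0 * np.cos(r * np.pi)
    first = two_T(2 ** (k - 1))
    first[0] -= 2.0 * np.cos(r * np.pi / 2)
    second = two_T(2 ** (k - 1))
    second[0] -= 2.0 * np.cos(np.pi * (1 - r / 2))
    return lhs, polynomial.polymul(first, second)

def linear_factor_zeros(k, r):
    """Apply (4.4) until the linear factors 2x - 2cos(r*pi) remain; return their zeros."""
    if k == 0:
        return [np.cos(r * np.pi)]
    return linear_factor_zeros(k - 1, r / 2) + linear_factor_zeros(k - 1, 1 - r / 2)

lhs, rhs = factor_step(3, 0.5)
zeros_k2 = linear_factor_zeros(2, 0.5)
```

For $k = 2$ and $r = 1/2$ the recursion returns the zeros in the order $\cos\frac{\pi}{8}, \cos\frac{7\pi}{8}, \cos\frac{3\pi}{8}, \cos\frac{5\pi}{8}$, that is, grouped by the factor over $\mathbb{Q}[\sqrt{2}]$ they belong to.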

4.2. Recursive fast $\text{DCT-4}_{2^k}$ algorithm. In this section a fast recursive $\text{DCT-4}_{2^k}$
algorithm is derived. The starting point is the polynomial algebra¹

$$\mathcal{A}_{\mathbb{Q}_{\cos r\pi}} = \mathbb{Q}_{\cos r\pi}[x]/(2T_{2^k}(x) - 2\cos r\pi)$$

with the basis $b = (V_0, \dots, V_{2^k - 1})$, which corresponds to the skew $\text{DCT-4}_n(r)$ [5], where
$0 < r < 1$. The conventional $\text{DCT-4}_n$ is the special case of the skew $\text{DCT-4}_n(r)$ for
$r = 1/2$.

Using (4.4) the algebra $\mathcal{A}_{\mathbb{Q}_{\cos r\pi}}$ can be decomposed into the direct sum of subalgebras

$$\mathbb{Q}_{\cos r\pi}[x]/(2T_{2^k}(x) - 2\cos r\pi) \to \mathbb{Q}_{\cos\frac{r\pi}{2}}[x]/(2T_{2^{k-1}}(x) - 2\cos\tfrac{r\pi}{2}) \oplus \mathbb{Q}_{\cos\frac{r\pi}{2}}[x]/(2T_{2^{k-1}}(x) - 2\cos\pi(1 - \tfrac{r}{2})). \tag{4.5}$$

Considering the ASP framework described in §2.1 and choosing the bases $c = d = (V_0, \dots,
V_{2^{k-1} - 1})$ in the subalgebras, the following fast $\text{DCT-4}_{2^k}(r)$ algorithm is derived:

$$\overline{\text{DCT-4}}_{2^k}(r) = P \cdot \left(\overline{\text{DCT-4}}_{2^{k-1}}(\tfrac{r}{2}) \oplus \overline{\text{DCT-4}}_{2^{k-1}}(1 - \tfrac{r}{2})\right) \cdot B^{(C4)}_{2^k}, \tag{4.6}$$

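where $P$ is a permutation matrix that, as in (2.6), maps the concatenation of the zeros of the two
factors in (4.5) to the ordered list of zeros of $2T_{2^k}(x) - 2\cos r\pi$,
and $B^{(C4)}_{2^k}$ is the change of basis matrix from the basis $b$ to the concatenation $(c, d)$.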
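The elements $V_\ell \in b$ for $0 \le \ell < m = 2^{k-1}$ are already contained in $c$ and $d$. The elements $V_{m+\ell}$
are expressed in the new bases as

$$\begin{aligned} V_{m+\ell} &\equiv -V_{m-\ell-1} + 2\cos\tfrac{r\pi}{2}\, V_\ell \mod 2T_m - 2\cos\tfrac{r\pi}{2}, \\ V_{m+\ell} &\equiv -V_{m-\ell-1} - 2\cos\tfrac{r\pi}{2}\, V_\ell \mod 2T_m - 2\cos\pi(1 - \tfrac{r}{2}). \end{aligned} \tag{4.7}$$

These equations can be derived from $2T_m = V_m + V_{m-1}$ and the recurrence for Chebyshev
polynomials $V_n = 2xV_{n-1} - V_{n-2}$. Given (4.7),

$$B^{(C4)}_{2^k} = \begin{bmatrix} I_m & aI_m - J_m \\ I_m & -aI_m - J_m \end{bmatrix},$$

where $I_m$ and $J_m$ are the identity and reverse identity $m \times m$ matrices and $a = 2\cos\frac{r\pi}{2}$.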

Finally, the fast algorithm for the conventional DCT-4 is obtained by setting $r = 1/2$ and multiplying (4.6)
from the left by the matrix $D^{(C4)}_{2^k}$ defined in (2.9).
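The recursion (4.6) can be prototyped with dense matrices to confirm that it reproduces the directly evaluated polynomial transform. The sketch below rests on two assumptions that are not fixed by the text above: the zeros are ordered by ascending angle (which reduces to $(k + \frac{1}{2})\frac{\pi}{n}$ for $r = 1/2$), and the permutation $P$ is found by sorting rather than written in closed form. The change of basis matrix is the block form that follows from (4.7); all function names are hypothetical.

```python
import numpy as np

def zero_angles(n, r):
    """Angles theta in (0, pi) with cos(n*theta) = cos(r*pi), in ascending order."""
    first = [(2 * j + r) * np.pi / n for j in range((n + 1) // 2)]
    second = [(2 * j + 2 - r) * np.pi / n for j in range(n // 2)]
    return np.sort(np.array(first + second))

def direct_transform(n, r):
    """[V_l(alpha_k)] evaluated from the closed form of V_l."""
    theta = zero_angles(n, r)[:, None]
    l = np.arange(n)[None, :]
    return np.cos((l + 0.5) * theta) / np.cos(theta / 2)

def fast_transform(n, r):
    """Polynomial transform of the skew DCT-4 of size n = 2^k via recursion (4.6)."""
    if n == 1:
        return np.ones((1, 1))          # basis (V_0) = (1)
    m = n // 2
    a = 2 * np.cos(r * np.pi / 2)
    I, J = np.eye(m), np.fliplr(np.eye(m))
    B = np.block([[I, a * I - J], [I, -a * I - J]])   # base change from (4.7)
    Z = np.zeros((m, m))
    D = np.block([[fast_transform(m, r / 2), Z],
                  [Z, fast_transform(m, 1 - r / 2)]])
    # Permutation P (assumption: found numerically by sorting the zero angles).
    concat = np.concatenate([zero_angles(m, r / 2), zero_angles(m, 1 - r / 2)])
    return (D @ B)[np.argsort(concat)]

def dct4_via_fast(n):
    """Conventional DCT-4: r = 1/2 and left scaling by the diagonal matrix (2.9)."""
    theta = zero_angles(n, 0.5)
    return np.diag(np.cos(theta / 2)) @ fast_transform(n, 0.5)
```

A real implementation would apply the sparse factors to a vector instead of forming dense products; the dense version is only meant to make the structure of (4.6) easy to verify.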

5. Discussion. The derived fast algorithm is a special case of the algorithm
obtained in [5, see eq. (48)]. It should be noted that the above-mentioned algorithm is
based on the decomposition property $T_{km} = T_k(T_m)$ of Chebyshev polynomials. We
have shown that the same algorithm can be obtained in a different way by using the
factorization (4.4). An important result is the establishment of a connection between
Galois theory and the structure of fast DCT algorithms.

¹ Here $\mathbb{Q}_{\cos r\pi}$ is used as a short notation for the field extension $\mathbb{Q}[\cos r\pi]$.

The scaled version of the obtained $n$-point fast DCT-4 algorithm (where $n = 2^k$) requires
$\frac{n}{2}\log_2 n$ multiplications and $\frac{3n}{2}\log_2 n$ additions. In order to scale the outputs, $n$ extra
multiplications are needed. Note that in video and image coding applications the
output of the DCT can usually be scaled since it will be followed by a quantizer that
can take this scaling into account.

A scaled version of the 8-point fast DCT-2 that requires only 5 multiplications can
be obtained using the proposed fast DCT-4 algorithm. An algorithm with the same
arithmetic complexity is obtained in pQ. However, using the proposed DCT-4 algo- 
rithm the method of derivation scaled versions of fast DCT-2 2 k algorithms with low 
multiplicative complexity is generalized (see the appendix for detail). 

Another interesting application of the proposed algorithm is the development of error-free
computational algorithms for the fast DCT. In [3] a method of error-free computation of the
fast DCT based on the Arai algorithm with algebraic integer (AI) encoding is presented.
The proposed algebraic technique allows one to express the quantity in each node of the graph
of a DCT algorithm as a vector over $\mathbb{Q}$, which is a crucial step for the AI encoding technique.
Therefore, by combining the proposed DCT algorithms and the AI encoding technique, a new
error-free computational algorithm for the DCT-2 of power-of-two size can be derived.

The proposed recursive algorithm is also well suited for the development of new
parallel-pipeline architectures of DCT processors. It can be used by automatic code
generation programs that search over alternative implementations of the same transform
to find the one that is best fitted to the desired platform [12].

6. Summary. An algebraic technique for the derivation of fast DCT algorithms is
presented. The technique is based on ASP and Galois theory. The main idea behind
the approach is to use the field of rational numbers $\mathbb{Q}$ for a polynomial algebra
associated with a DCT. The fast DCT algorithm is derived as a result of a stepwise
decomposition of the polynomial algebra and requires the extension of the base field.
The extension is determined using Galois theory. The proposed technique is applied
to the derivation of a fast $\text{DCT-4}_{2^k}$ algorithm.

Appendix A. Scaled $\text{DCT-2}_{2^k}$ algorithm. $\text{DCT-2}_n$ arises from the polynomial
algebra

$$\mathcal{A}_{\mathbb{Q}} = \mathbb{Q}[x]/(x - 1)U_{n-1}(x), \quad b = (V_0, \dots, V_{n-1}),$$

where $U$ is the Chebyshev polynomial of the second kind. Using the factorization

$$U_{2n-1}(x) = U_{n-1}(x) \cdot 2T_n(x),$$

the algebra $\mathbb{Q}[x]/(x - 1)U_{2n-1}(x)$ can be decomposed as

$$\mathbb{Q}[x]/(x - 1)U_{2n-1}(x) \to \mathbb{Q}[x]/(x - 1)U_{n-1}(x) \oplus \mathbb{Q}[x]/2T_n(x), \tag{A.1}$$

which according to (2.3)-(2.5) leads to the following fast algorithm [8]:

$$\overline{\text{DCT-2}}_{2n} = L^{2n}_n \left(\overline{\text{DCT-2}}_n \oplus \overline{\text{DCT-4}}_n\right) B_{2n}, \tag{A.2}$$

where $L^{2n}_n$ is the stride permutation matrix and $B_{2n}$ is the change of basis matrix. $B_{2n}$
maps the basis $b$ to the concatenation $(c, d)$, where $c = d = (V_0, \dots, V_{n-1})$ are the bases
of the subalgebras on the right-hand side of (A.1). The first $n$ columns of $B_{2n}$ are

$$B_{2n} = \begin{bmatrix} I_n & * \\ I_n & * \end{bmatrix},$$

since the elements $V_\ell \in b$ for $0 \le \ell < n$ are already contained in $c$ and $d$. The remaining
entries are determined by the following expressions:

$$V_{n+\ell} \equiv V_{n-\ell-1} \mod (x - 1)U_{n-1}, \tag{A.3}$$

$$V_{n+\ell} \equiv -V_{n-\ell-1} \mod 2T_n, \tag{A.4}$$

which yields

$$B_{2n} = \begin{bmatrix} I_n & J_n \\ I_n & -J_n \end{bmatrix}.$$

(A.3) and (A.4) can be derived using the relations $2T_n = V_n + V_{n-1}$,
$2(x - 1)U_{n-1} = V_n - V_{n-1}$ and $V_n = 2xV_{n-1} - V_{n-2}$. Note that the decomposition (A.1) does
not require an extension of the base field $\mathbb{Q}$. This leads to a multiplication-free change of
basis matrix $B_{2n}$.
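The factorization (A.2) can be checked numerically with dense matrices. The sketch below assumes that the zeros of $(x - 1)U_{n-1}$ are ordered as $\cos\frac{k\pi}{n}$, $0 \le k < n$, and realizes the stride permutation as an interleaving of the two half-size outputs; both are assumptions about conventions, and the function names are hypothetical.

```python
import numpy as np

def dct2_bar(n):
    """[V_l(cos(k*pi/n))]: polynomial transform for Q[x]/(x-1)U_{n-1}."""
    theta = (np.arange(n) * np.pi / n)[:, None]
    l = np.arange(n)[None, :]
    return np.cos((l + 0.5) * theta) / np.cos(theta / 2)

def dct4_bar(n):
    """[V_l(cos((k+1/2)*pi/n))]: polynomial transform for Q[x]/2T_n."""
    theta = ((np.arange(n) + 0.5) * np.pi / n)[:, None]
    l = np.arange(n)[None, :]
    return np.cos((l + 0.5) * theta) / np.cos(theta / 2)

def dct2_bar_split(n):
    """Right-hand side of (A.2), producing the transform of size 2n."""
    I, J = np.eye(n), np.fliplr(np.eye(n))
    B = np.block([[I, J], [I, -J]])          # multiplication-free base change
    Z = np.zeros((n, n))
    D = np.block([[dct2_bar(n), Z], [Z, dct4_bar(n)]])
    y = D @ B
    out = np.empty_like(y)
    out[0::2] = y[:n]    # zeros cos(2j*pi/(2n)) come from the DCT-2 part
    out[1::2] = y[n:]    # zeros cos((2j+1)*pi/(2n)) come from the DCT-4 part
    return out
```

Since `B` contains only entries $0$ and $\pm 1$, this step costs additions only, in line with the remark that (A.1) needs no extension of $\mathbb{Q}$.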

When the size of $\text{DCT-2}_n$ is a power of two, (A.2) can be applied recursively to
obtain a fast algorithm. Joint use of the factorizations (A.2) and (4.6) leads to the scaled
recursive fast DCT-2 algorithm. For $n = 16$ this algorithm requires only 17
multiplications.
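The multiplication counts quoted for the scaled algorithms can be reproduced with a simple recurrence. The sketch below assumes that each application of (4.6) to a block of size $n$ costs $n/2$ multiplications (one per row of the $aI_m$ block in the change of basis matrix), that the base change in (A.2) is multiplication-free, and that output scaling is not counted; the function names are hypothetical.

```python
def dct4_multiplications(n):
    """Multiplications of the scaled recursive DCT-4 of size n = 2^k."""
    if n == 1:
        return 0
    # n/2 multiplications by a = 2*cos(r*pi/2), then two half-size problems.
    return n // 2 + 2 * dct4_multiplications(n // 2)

def dct2_multiplications(n):
    """Multiplications of the scaled recursive DCT-2 of size n = 2^k."""
    if n == 1:
        return 0
    # (A.2): a DCT-2 and a DCT-4 of half size, with a multiplication-free base change.
    return dct2_multiplications(n // 2) + dct4_multiplications(n // 2)

counts = {n: (dct4_multiplications(n), dct2_multiplications(n)) for n in (2, 4, 8, 16)}
```

Under these assumptions the recurrence gives 5 multiplications for the 8-point DCT-2 and 17 for the 16-point DCT-2, in agreement with the figures stated in the text.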